 rotation invariance



e57c6b956a6521b28495f2886ca0977a-Paper.pdf

Neural Information Processing Systems

Attention mechanisms have shown great performance and efficiency in many deep learning models, in which relative position encoding plays a crucial role. However, when introducing attention to manifolds, there is no canonical local coordinate system to parameterize neighborhoods.




Hierarchical Direction Perception via Atomic Dot-Product Operators for Rotation-Invariant Point Clouds Learning

arXiv.org Artificial Intelligence

Point cloud processing has become a cornerstone technology in many 3D vision tasks. However, arbitrary rotations introduce variations in point cloud orientations, posing a long-standing challenge for effective representation learning. The core of this issue is the disruption of the point cloud's intrinsic directional characteristics caused by rotational perturbations. Recent methods attempt to implicitly model rotational equivariance and invariance, preserving directional information and propagating it into deep semantic spaces. Yet, they often fall short of fully exploiting the multiscale directional nature of point clouds to enhance feature representations. To address this, we propose the Direction-Perceptive Vector Network (DiPVNet). At its core is an atomic dot-product operator that simultaneously encodes directional selectivity and rotation invariance, endowing the network with both rotational symmetry modeling and adaptive directional perception. At the local level, we introduce a Learnable Local Dot-Product (L2DP) Operator, which enables interactions between a center point and its neighbors to adaptively capture the non-uniform local structures of point clouds. At the global level, we leverage generalized harmonic analysis to prove that the dot-product between point clouds and spherical sampling vectors is equivalent to a direction-aware spherical Fourier transform (DASFT). This leads to the construction of a global directional response spectrum for modeling holistic directional structures. We rigorously prove the rotation invariance of both operators. Extensive experiments on challenging scenarios involving noise and large-angle rotations demonstrate that DiPVNet achieves state-of-the-art performance on point cloud classification and segmentation tasks. Our code is available at https://github.com/wxszreal0/DiPVNet.
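The rotation invariance claimed for dot-product operators rests on the identity (Rx)·(Ry) = x·y for any rotation R. The NumPy sketch below checks that identity numerically on a center point and its neighbors. It is only an illustration of the underlying property, not the L2DP or DASFT operator from DiPVNet; the helper names `random_rotation` and `local_dot_products` are hypothetical and do not come from the paper or its repository.

```python
import numpy as np

def random_rotation(rng):
    """Return a random 3x3 proper rotation matrix (orthogonal, det = +1)."""
    # QR decomposition of a Gaussian matrix yields an orthogonal factor q.
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    # Normalize column signs, then flip one column if q is a reflection.
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

def local_dot_products(center, neighbors):
    """Gram matrix of the neighbor offsets relative to the center point."""
    rel = neighbors - center          # shape (k, 3)
    return rel @ rel.T                # shape (k, k), entry (i, j) = rel_i . rel_j

rng = np.random.default_rng(0)
center = rng.normal(size=3)
neighbors = rng.normal(size=(8, 3))
R = random_rotation(rng)

before = local_dot_products(center, neighbors)
# Rotate every point with the same R (row vectors, hence the transpose).
after = local_dot_products(center @ R.T, neighbors @ R.T)

invariant = np.allclose(before, after)
print("dot products unchanged by rotation:", invariant)
```

Because the offsets are taken relative to the center point, the same quantities are also unchanged by translation; a learnable operator built on such dot products would inherit both properties.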


we propose a more general framework that can also be adopted to orient whole objects and perform rotation-invariant

Neural Information Processing Systems

Moreover, as recently shown in Bai et al. in "D3Feat: Joint Learning of Dense Detection [...] We will add this information to the revised version. As suggested, we will use only the term orientation. We will modify it in the final version of the paper. We agree that the domain of Spherical CNN feature maps is key and we will better highlight it in the final version. Since we seek one rotation, the loss function in (6) is applied once, and only to the last layer of the network.


We have read and appreciate all comments, due to page limit we will only address a subset of questions/concerns

Neural Information Processing Systems

We have read and appreciate all comments; due to the page limit, we will only address a subset of questions/concerns. We do not expect this will significantly change our performance. R1: Can you evaluate the coarseness of the approximation? By sampling 1000 random singular values with an L2 norm < 50, we get the following results. R4: Approximation = loss not necessarily convex: This is true.




Review for NeurIPS paper: Rotation-Invariant Local-to-Global Representation Learning for 3D Point Cloud

Neural Information Processing Systems

Weaknesses: The idea of estimating and relying on a local reference frame to achieve rotation invariance has been explored before in a similar context, which might reduce the novelty of this paper. For example, "A-CNN: Annularly Convolutional Neural Networks on Point Clouds, CVPR'19" uses the local point set to estimate the normal, as this paper does; the difference is that A-CNN uses this normal to project 3D points onto a 2D plane. However, the basic idea of both is to achieve local rotation invariance. "Relation-Shape Convolutional Neural Network for Point Cloud Analysis, CVPR'19" mentions in its rotation-invariance experiments that it constructs a local reference frame to obtain a rotation-invariant representation of the local point set, which is the same as in this paper. The randomized technique is also a common technique in training deep networks for exploring a larger data space or parameter space. The whole hierarchy is identical to PointNet.
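For readers unfamiliar with the local-reference-frame idea the review refers to, a minimal NumPy sketch follows. It assumes a generic PCA-based frame (normal taken as the direction of least variance, axis signs fixed by the skewness of the projections) and is not the construction used in the reviewed paper, A-CNN, or RS-CNN. The function names `local_reference_frame` and `random_rotation` are hypothetical.

```python
import numpy as np

def local_reference_frame(points):
    """Express a local point set in a PCA-based frame (generic sketch)."""
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    _, vecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    normal = vecs[:, 0]               # least-variance direction ~ surface normal
    major = vecs[:, 2]                # largest-variance direction

    def orient(axis):
        # Eigenvectors are defined only up to sign; fix the sign with the
        # third moment of the projections so the choice follows the data.
        return axis if np.sum((centered @ axis) ** 3) >= 0 else -axis

    z = orient(normal)
    x = orient(major)
    y = np.cross(z, x)                # completes a right-handed frame
    frame = np.stack([x, y, z], axis=1)
    return centered @ frame           # coordinates in the local frame

def random_rotation(rng):
    """Random proper rotation (det = +1) via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

rng = np.random.default_rng(1)
patch = rng.normal(size=(32, 3)) * np.array([3.0, 2.0, 0.5])  # anisotropic patch
R = random_rotation(rng)

coords = local_reference_frame(patch)
coords_rotated = local_reference_frame(patch @ R.T)
same = np.allclose(coords, coords_rotated, atol=1e-8)
print("local-frame coordinates unchanged by rotation:", same)
```

The sketch relies on the covariance having distinct eigenvalues and on nonzero third moments; on near-symmetric patches the frame becomes ambiguous, which is one reason practical methods add further disambiguation steps.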